re2/doc/syntax.txt
Change-Id: Iff57514e09d6e4e141384dd0cf138314eb1435f1 Reviewed-on: https://code-review.googlesource.com/c/re2/+/53330 Reviewed-by: Paul Wankadia <junyer@google.com>
| RE2 regular expression syntax reference | |
| ------------------------------------- | |
| Single characters: | |
| . any character, possibly including newline (s=true) | |
| [xyz] character class | |
| [^xyz] negated character class | |
| \d Perl character class | |
| \D negated Perl character class | |
| [[:alpha:]] ASCII character class | |
| [[:^alpha:]] negated ASCII character class | |
| \pN Unicode character class (one-letter name) | |
| \p{Greek} Unicode character class | |
| \PN negated Unicode character class (one-letter name) | |
| \P{Greek} negated Unicode character class | |
| Composites: | |
| xy «x» followed by «y» | |
| x|y «x» or «y» (prefer «x») | |
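The "prefer «x»" rule for alternation can be checked with a small sketch. It assumes Go's standard regexp package as a stand-in engine (its documentation states that it accepts RE2 syntax, apart from «\C»); the helper name firstAlt is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstAlt returns the leftmost match of pattern in text ("" if none).
// Hypothetical helper for illustration only.
func firstAlt(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	// Both alternatives can match at position 0; the one listed first wins.
	fmt.Printf("%q\n", firstAlt(`a|ab`, "ab"))
	fmt.Printf("%q\n", firstAlt(`ab|a`, "ab"))
}
```

Swapping the order of the alternatives changes the reported match even though the set of matching strings is the same.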
| Repetitions: | |
| x* zero or more «x», prefer more | |
| x+ one or more «x», prefer more | |
| x? zero or one «x», prefer one | |
| x{n,m} «n» or «n»+1 or ... or «m» «x», prefer more | |
| x{n,} «n» or more «x», prefer more | |
| x{n} exactly «n» «x» | |
| x*? zero or more «x», prefer fewer | |
| x+? one or more «x», prefer fewer | |
| x?? zero or one «x», prefer zero | |
| x{n,m}? «n» or «n»+1 or ... or «m» «x», prefer fewer | |
| x{n,}? «n» or more «x», prefer fewer | |
| x{n}? exactly «n» «x» | |
| x{} (== x*) NOT SUPPORTED vim | |
| x{-} (== x*?) NOT SUPPORTED vim | |
| x{-n} (== x{n}?) NOT SUPPORTED vim | |
| x= (== x?) NOT SUPPORTED vim | |
| Implementation restriction: The counting forms «x{n,m}», «x{n,}», and «x{n}» | |
| are rejected if the minimum or maximum repetition count exceeds 1000. | |
| Unlimited repetitions are not subject to this restriction. | |
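The "prefer more" and "prefer fewer" wording, and the limit of 1000 on counted repetitions, can be illustrated as follows. This is a sketch under the assumption that Go's standard regexp package (which documents RE2 syntax support and the same repetition limit) behaves like RE2 here; firstMatch and compiles are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in text ("" if none or empty).
func firstMatch(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

// compiles reports whether the parser accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	text := "aaaa"
	// Greedy forms take as much as they can, non-greedy forms as little.
	for _, p := range []string{`a*`, `a*?`, `a+`, `a+?`, `a?`, `a??`, `a{2,3}`, `a{2,3}?`, `a{2,}?`} {
		fmt.Printf("%-8s %q\n", p, firstMatch(p, text))
	}
	// Counts above 1000 are rejected; unbounded «*» is not affected.
	for _, p := range []string{`a{1000}`, `a{1001}`, `a{2,1001}`, `a*`} {
		fmt.Printf("%-10s compiles=%v\n", p, compiles(p))
	}
}
```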
| Possessive repetitions: | |
| x*+ zero or more «x», possessive NOT SUPPORTED | |
| x++ one or more «x», possessive NOT SUPPORTED | |
| x?+ zero or one «x», possessive NOT SUPPORTED | |
| x{n,m}+ «n» or ... or «m» «x», possessive NOT SUPPORTED | |
| x{n,}+ «n» or more «x», possessive NOT SUPPORTED | |
| x{n}+ exactly «n» «x», possessive NOT SUPPORTED | |
| Grouping: | |
| (re) numbered capturing group (submatch) | |
| (?P<name>re) named & numbered capturing group (submatch) | |
| (?<name>re) named & numbered capturing group (submatch) NOT SUPPORTED | |
| (?'name're) named & numbered capturing group (submatch) NOT SUPPORTED | |
| (?:re) non-capturing group | |
| (?flags) set flags within current group; non-capturing | |
| (?flags:re) set flags during re; non-capturing | |
| (?#text) comment NOT SUPPORTED | |
| (?|x|y|z) branch numbering reset NOT SUPPORTED | |
| (?>re) possessive match of «re» NOT SUPPORTED | |
| re@> possessive match of «re» NOT SUPPORTED vim | |
| %(re) non-capturing group NOT SUPPORTED vim | |
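As a hedged illustration of the supported grouping forms, the sketch below combines «(?P<name>re)» with «(?:re)». It uses Go's standard regexp package as an assumed RE2-compatible engine; dateRE and namedGroups are hypothetical names, and the date pattern is only an example.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRE has two named capturing groups and one non-capturing group.
var dateRE = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})(?:-\d{2})?`)

// namedGroups maps each group name to its submatch, or returns nil on no match.
func namedGroups(re *regexp.Regexp, text string) map[string]string {
	m := re.FindStringSubmatch(text)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 (whole match) and unnamed groups have no name
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	// Only capturing groups are counted; (?:...) is not.
	fmt.Println("capturing groups:", dateRE.NumSubexp())
	fmt.Println(namedGroups(dateRE, "released 2020-08-15"))
}
```

Named groups are also numbered, so the same submatches remain reachable by index.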
| Flags: | |
| i case-insensitive (default false) | |
| m multi-line mode: «^» and «$» match begin/end line in addition to begin/end text (default false) | |
| s let «.» match «\n» (default false) | |
| U ungreedy: swap meaning of «x*» and «x*?», «x+» and «x+?», etc (default false) | |
| Flag syntax is «xyz» (set) or «-xyz» (clear) or «xy-z» (set «xy», clear «z»). | |
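The flag forms above can be exercised with a short sketch. It assumes Go's standard regexp package as an RE2-compatible engine; found and leftmost are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// found reports whether pattern matches anywhere in text.
func found(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

// leftmost returns the leftmost match of pattern in text.
func leftmost(pattern, text string) string {
	return regexp.MustCompile(pattern).FindString(text)
}

func main() {
	fmt.Println(found(`(?i)hello`, "HeLLo"))  // i: case-insensitive
	fmt.Println(found(`a.c`, "a\nc"))         // s is off by default
	fmt.Println(found(`(?s)a.c`, "a\nc"))     // s: "." also matches "\n"
	fmt.Println(found(`^(?i:a)b$`, "Ab"))     // flag applies only inside the group
	fmt.Println(found(`^(?i:a)b$`, "AB"))     // "b" outside the group stays case-sensitive
	fmt.Println(found(`^(?i)a(?-i)b$`, "AB")) // "-i" clears the flag again
	// U swaps the meaning of the greedy and non-greedy operators.
	fmt.Printf("%q %q\n", leftmost(`(?U)a+`, "aaa"), leftmost(`(?U)a+?`, "aaa"))
}
```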
| Empty strings: | |
| ^ at beginning of text or line («m»=true) | |
| $ at end of text (like «\z» not «\Z») or line («m»=true) | |
| \A at beginning of text | |
| \b at ASCII word boundary («\w» on one side and «\W», «\A», or «\z» on the other) | |
| \B not at ASCII word boundary | |
| \G at beginning of subtext being searched NOT SUPPORTED pcre | |
| \G at end of last match NOT SUPPORTED perl | |
| \Z at end of text, or before newline at end of text NOT SUPPORTED | |
| \z at end of text | |
| (?=re) before text matching «re» NOT SUPPORTED | |
| (?!re) before text not matching «re» NOT SUPPORTED | |
| (?<=re) after text matching «re» NOT SUPPORTED | |
| (?<!re) after text not matching «re» NOT SUPPORTED | |
| re& before text matching «re» NOT SUPPORTED vim | |
| re@= before text matching «re» NOT SUPPORTED vim | |
| re@! before text not matching «re» NOT SUPPORTED vim | |
| re@<= after text matching «re» NOT SUPPORTED vim | |
| re@<! after text not matching «re» NOT SUPPORTED vim | |
| \zs sets start of match (= \K) NOT SUPPORTED vim | |
| \ze sets end of match NOT SUPPORTED vim | |
| \%^ beginning of file NOT SUPPORTED vim | |
| \%$ end of file NOT SUPPORTED vim | |
| \%V on screen NOT SUPPORTED vim | |
| \%# cursor position NOT SUPPORTED vim | |
| \%'m mark «m» position NOT SUPPORTED vim | |
| \%23l in line 23 NOT SUPPORTED vim | |
| \%23c in column 23 NOT SUPPORTED vim | |
| \%23v in virtual column 23 NOT SUPPORTED vim | |
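The difference between text and line anchors, and the ASCII definition of «\b», can be seen in a small sketch. Go's standard regexp package is assumed here as an RE2-compatible engine; allMatches is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// allMatches returns every non-overlapping match of pattern in text.
func allMatches(pattern, text string) []string {
	return regexp.MustCompile(pattern).FindAllString(text, -1)
}

func main() {
	text := "one\ntwo\n"
	fmt.Printf("%q\n", allMatches(`^\w+`, text))     // beginning of text only
	fmt.Printf("%q\n", allMatches(`(?m)^\w+`, text)) // beginning of every line
	fmt.Printf("%q\n", allMatches(`\w+$`, text))     // "$" behaves like \z: the trailing newline blocks it
	fmt.Printf("%q\n", allMatches(`(?m)\w+$`, text)) // end of every line
	fmt.Printf("%q\n", allMatches(`\w+\z`, "one\ntwo"))
	fmt.Printf("%q\n", allMatches(`\bcat\b`, "cat concat cat!"))
}
```

Because «$» without the «m» flag does not match before a final newline, input that may end in «\n» needs either «(?m)» or an explicit optional newline in the pattern.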
| Escape sequences: | |
| \a bell (== \007) | |
| \f form feed (== \014) | |
| \t horizontal tab (== \011) | |
| \n newline (== \012) | |
| \r carriage return (== \015) | |
| \v vertical tab character (== \013) | |
| \* literal «*», for any punctuation character «*» | |
| \123 octal character code (up to three digits) | |
| \x7F hex character code (exactly two digits) | |
| \x{10FFFF} hex character code | |
| \C match a single byte even in UTF-8 mode | |
| \Q...\E literal text «...» even if «...» has punctuation | |
| \1 backreference NOT SUPPORTED | |
| \b backspace NOT SUPPORTED (use «\010») | |
| \cK control char ^K NOT SUPPORTED (use «\001» etc) | |
| \e escape NOT SUPPORTED (use «\033») | |
| \g1 backreference NOT SUPPORTED | |
| \g{1} backreference NOT SUPPORTED | |
| \g{+1} backreference NOT SUPPORTED | |
| \g{-1} backreference NOT SUPPORTED | |
| \g{name} named backreference NOT SUPPORTED | |
| \g<name> subroutine call NOT SUPPORTED | |
| \g'name' subroutine call NOT SUPPORTED | |
| \k<name> named backreference NOT SUPPORTED | |
| \k'name' named backreference NOT SUPPORTED | |
| \lX lowercase «X» NOT SUPPORTED | |
| \ux uppercase «x» NOT SUPPORTED | |
| \L...\E lowercase text «...» NOT SUPPORTED | |
| \K reset beginning of «$0» NOT SUPPORTED | |
| \N{name} named Unicode character NOT SUPPORTED | |
| \R line break NOT SUPPORTED | |
| \U...\E upper case text «...» NOT SUPPORTED | |
| \X extended Unicode sequence NOT SUPPORTED | |
| \%d123 decimal character 123 NOT SUPPORTED vim | |
| \%xFF hex character FF NOT SUPPORTED vim | |
| \%o123 octal character 123 NOT SUPPORTED vim | |
| \%u1234 Unicode character 0x1234 NOT SUPPORTED vim | |
| \%U12345678 Unicode character 0x12345678 NOT SUPPORTED vim | |
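A few of the supported escape sequences can be tried out as follows. This is a sketch assuming Go's standard regexp package as an RE2-compatible engine («\C» is deliberately left out because that package does not accept it); fullMatch is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch reports whether pattern matches the whole of text.
func fullMatch(pattern, text string) bool {
	return regexp.MustCompile(`^(?:` + pattern + `)$`).MatchString(text)
}

func main() {
	// Hex (two digits), octal, and braced hex forms: 'A', 'A', U+263A.
	fmt.Println(fullMatch(`\x41\101\x{263A}`, "AA\u263A"))
	// Control-character escapes match the corresponding bytes.
	fmt.Println(fullMatch(`\a\f\t\n\r\v`, "\a\f\t\n\r\v"))
	// A backslash before punctuation makes it literal.
	fmt.Println(fullMatch(`\*\.\+`, "*.+"))
	// \Q...\E quotes everything in between, so "." and "*" lose their meaning.
	fmt.Println(fullMatch(`\Qa.b*\E`, "a.b*"))
	fmt.Println(fullMatch(`\Qa.b*\E`, "axbb"))
	fmt.Println(fullMatch(`a.b*`, "axbb"))
}
```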
| Character class elements: | |
| x single character | |
| A-Z character range (inclusive) | |
| \d Perl character class | |
| [:foo:] ASCII character class «foo» | |
| \p{Foo} Unicode character class «Foo» | |
| \pF Unicode character class «F» (one-letter name) | |
| Named character classes as character class elements: | |
| [\d] digits (== \d) | |
| [^\d] not digits (== \D) | |
| [\D] not digits (== \D) | |
| [^\D] not not digits (== \d) | |
| [[:name:]] named ASCII class inside character class (== [:name:]) | |
| [^[:name:]] named ASCII class inside negated character class (== [:^name:]) | |
| [\p{Name}] named Unicode property inside character class (== \p{Name}) | |
| [^\p{Name}] named Unicode property inside negated character class (== \P{Name}) | |
| Perl character classes (all ASCII-only): | |
| \d digits (== [0-9]) | |
| \D not digits (== [^0-9]) | |
| \s whitespace (== [\t\n\f\r ]) | |
| \S not whitespace (== [^\t\n\f\r ]) | |
| \w word characters (== [0-9A-Za-z_]) | |
| \W not word characters (== [^0-9A-Za-z_]) | |
| \h horizontal space NOT SUPPORTED | |
| \H not horizontal space NOT SUPPORTED | |
| \v vertical space NOT SUPPORTED | |
| \V not vertical space NOT SUPPORTED | |
| ASCII character classes: | |
| [[:alnum:]] alphanumeric (== [0-9A-Za-z]) | |
| [[:alpha:]] alphabetic (== [A-Za-z]) | |
| [[:ascii:]] ASCII (== [\x00-\x7F]) | |
| [[:blank:]] blank (== [\t ]) | |
| [[:cntrl:]] control (== [\x00-\x1F\x7F]) | |
| [[:digit:]] digits (== [0-9]) | |
| [[:graph:]] graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]) | |
| [[:lower:]] lower case (== [a-z]) | |
| [[:print:]] printable (== [ -~] == [ [:graph:]]) | |
| [[:punct:]] punctuation (== [!-/:-@[-`{-~]) | |
| [[:space:]] whitespace (== [\t\n\v\f\r ]) | |
| [[:upper:]] upper case (== [A-Z]) | |
| [[:word:]] word characters (== [0-9A-Za-z_]) | |
| [[:xdigit:]] hex digit (== [0-9A-Fa-f]) | |
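Since the Perl and ASCII classes above are ASCII-only, non-ASCII digits and letters need the Unicode classes instead. The sketch below shows this, along with the small difference between «\s» and «[[:space:]]»; it assumes Go's standard regexp package as an RE2-compatible engine, and ok is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// ok reports whether pattern matches somewhere in text.
func ok(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func main() {
	arabicThree := "\u0663" // ARABIC-INDIC DIGIT THREE
	cafe := "caf\u00e9"     // "café"

	// \d is [0-9]; \p{Nd} covers decimal digits in any script.
	fmt.Println(ok(`^\d$`, "7"), ok(`^\d$`, arabicThree), ok(`^\p{Nd}$`, arabicThree))
	// \w is [0-9A-Za-z_]; add \p{L} for letters outside ASCII.
	fmt.Println(ok(`^\w+$`, cafe), ok(`^[\w\p{L}]+$`, cafe))
	// \s omits vertical tab, [[:space:]] includes it.
	fmt.Println(ok(`^\s$`, "\v"), ok(`^[[:space:]]$`, "\v"))
	// Negation can be written inside or outside the class name.
	fmt.Println(ok(`^[^\D]+$`, "2020"), ok(`^[[:^alpha:]]+$`, "2020"), ok(`^[^[:alpha:]]+$`, "2020"))
}
```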
| Unicode character class names--general category: | |
| C other | |
| Cc control | |
| Cf format | |
| Cn unassigned code points NOT SUPPORTED | |
| Co private use | |
| Cs surrogate | |
| L letter | |
| LC cased letter NOT SUPPORTED | |
| L& cased letter NOT SUPPORTED | |
| Ll lowercase letter | |
| Lm modifier letter | |
| Lo other letter | |
| Lt titlecase letter | |
| Lu uppercase letter | |
| M mark | |
| Mc spacing mark | |
| Me enclosing mark | |
| Mn non-spacing mark | |
| N number | |
| Nd decimal number | |
| Nl letter number | |
| No other number | |
| P punctuation | |
| Pc connector punctuation | |
| Pd dash punctuation | |
| Pe close punctuation | |
| Pf final punctuation | |
| Pi initial punctuation | |
| Po other punctuation | |
| Ps open punctuation | |
| S symbol | |
| Sc currency symbol | |
| Sk modifier symbol | |
| Sm math symbol | |
| So other symbol | |
| Z separator | |
| Zl line separator | |
| Zp paragraph separator | |
| Zs space separator | |
| Unicode character class names--scripts: | |
| Adlam | |
| Ahom | |
| Anatolian_Hieroglyphs | |
| Arabic | |
| Armenian | |
| Avestan | |
| Balinese | |
| Bamum | |
| Bassa_Vah | |
| Batak | |
| Bengali | |
| Bhaiksuki | |
| Bopomofo | |
| Brahmi | |
| Braille | |
| Buginese | |
| Buhid | |
| Canadian_Aboriginal | |
| Carian | |
| Caucasian_Albanian | |
| Chakma | |
| Cham | |
| Cherokee | |
| Chorasmian | |
| Common | |
| Coptic | |
| Cuneiform | |
| Cypriot | |
| Cyrillic | |
| Deseret | |
| Devanagari | |
| Dives_Akuru | |
| Dogra | |
| Duployan | |
| Egyptian_Hieroglyphs | |
| Elbasan | |
| Elymaic | |
| Ethiopic | |
| Georgian | |
| Glagolitic | |
| Gothic | |
| Grantha | |
| Greek | |
| Gujarati | |
| Gunjala_Gondi | |
| Gurmukhi | |
| Han | |
| Hangul | |
| Hanifi_Rohingya | |
| Hanunoo | |
| Hatran | |
| Hebrew | |
| Hiragana | |
| Imperial_Aramaic | |
| Inherited | |
| Inscriptional_Pahlavi | |
| Inscriptional_Parthian | |
| Javanese | |
| Kaithi | |
| Kannada | |
| Katakana | |
| Kayah_Li | |
| Kharoshthi | |
| Khitan_Small_Script | |
| Khmer | |
| Khojki | |
| Khudawadi | |
| Lao | |
| Latin | |
| Lepcha | |
| Limbu | |
| Linear_A | |
| Linear_B | |
| Lisu | |
| Lycian | |
| Lydian | |
| Mahajani | |
| Makasar | |
| Malayalam | |
| Mandaic | |
| Manichaean | |
| Marchen | |
| Masaram_Gondi | |
| Medefaidrin | |
| Meetei_Mayek | |
| Mende_Kikakui | |
| Meroitic_Cursive | |
| Meroitic_Hieroglyphs | |
| Miao | |
| Modi | |
| Mongolian | |
| Mro | |
| Multani | |
| Myanmar | |
| Nabataean | |
| Nandinagari | |
| New_Tai_Lue | |
| Newa | |
| Nko | |
| Nushu | |
| Nyiakeng_Puachue_Hmong | |
| Ogham | |
| Ol_Chiki | |
| Old_Hungarian | |
| Old_Italic | |
| Old_North_Arabian | |
| Old_Permic | |
| Old_Persian | |
| Old_Sogdian | |
| Old_South_Arabian | |
| Old_Turkic | |
| Oriya | |
| Osage | |
| Osmanya | |
| Pahawh_Hmong | |
| Palmyrene | |
| Pau_Cin_Hau | |
| Phags_Pa | |
| Phoenician | |
| Psalter_Pahlavi | |
| Rejang | |
| Runic | |
| Samaritan | |
| Saurashtra | |
| Sharada | |
| Shavian | |
| Siddham | |
| SignWriting | |
| Sinhala | |
| Sogdian | |
| Sora_Sompeng | |
| Soyombo | |
| Sundanese | |
| Syloti_Nagri | |
| Syriac | |
| Tagalog | |
| Tagbanwa | |
| Tai_Le | |
| Tai_Tham | |
| Tai_Viet | |
| Takri | |
| Tamil | |
| Tangut | |
| Telugu | |
| Thaana | |
| Thai | |
| Tibetan | |
| Tifinagh | |
| Tirhuta | |
| Ugaritic | |
| Vai | |
| Wancho | |
| Warang_Citi | |
| Yezidi | |
| Yi | |
| Zanabazar_Square | |
| Vim character classes: | |
| \i identifier character NOT SUPPORTED vim | |
| \I «\i» except digits NOT SUPPORTED vim | |
| \k keyword character NOT SUPPORTED vim | |
| \K «\k» except digits NOT SUPPORTED vim | |
| \f file name character NOT SUPPORTED vim | |
| \F «\f» except digits NOT SUPPORTED vim | |
| \p printable character NOT SUPPORTED vim | |
| \P «\p» except digits NOT SUPPORTED vim | |
| \s whitespace character (== [ \t]) NOT SUPPORTED vim | |
| \S non-white space character (== [^ \t]) NOT SUPPORTED vim | |
| \d digits (== [0-9]) vim | |
| \D not «\d» vim | |
| \x hex digits (== [0-9A-Fa-f]) NOT SUPPORTED vim | |
| \X not «\x» NOT SUPPORTED vim | |
| \o octal digits (== [0-7]) NOT SUPPORTED vim | |
| \O not «\o» NOT SUPPORTED vim | |
| \w word character vim | |
| \W not «\w» vim | |
| \h head of word character NOT SUPPORTED vim | |
| \H not «\h» NOT SUPPORTED vim | |
| \a alphabetic NOT SUPPORTED vim | |
| \A not «\a» NOT SUPPORTED vim | |
| \l lowercase NOT SUPPORTED vim | |
| \L not lowercase NOT SUPPORTED vim | |
| \u uppercase NOT SUPPORTED vim | |
| \U not uppercase NOT SUPPORTED vim | |
| \_x «\x» plus newline, for any «x» NOT SUPPORTED vim | |
| Vim flags: | |
| \c ignore case NOT SUPPORTED vim | |
| \C match case NOT SUPPORTED vim | |
| \m magic NOT SUPPORTED vim | |
| \M nomagic NOT SUPPORTED vim | |
| \v verymagic NOT SUPPORTED vim | |
| \V verynomagic NOT SUPPORTED vim | |
| \Z ignore differences in Unicode combining characters NOT SUPPORTED vim | |
| Magic: | |
| (?{code}) arbitrary Perl code NOT SUPPORTED perl | |
| (??{code}) postponed arbitrary Perl code NOT SUPPORTED perl | |
| (?n) recursive call to regexp capturing group «n» NOT SUPPORTED | |
| (?+n) recursive call to relative group «+n» NOT SUPPORTED | |
| (?-n) recursive call to relative group «-n» NOT SUPPORTED | |
| (?C) PCRE callout NOT SUPPORTED pcre | |
| (?R) recursive call to entire regexp (== (?0)) NOT SUPPORTED | |
| (?&name) recursive call to named group NOT SUPPORTED | |
| (?P=name) named backreference NOT SUPPORTED | |
| (?P>name) recursive call to named group NOT SUPPORTED | |
| (?(cond)true|false) conditional branch NOT SUPPORTED | |
| (?(cond)true) conditional branch NOT SUPPORTED | |
| (*ACCEPT) make regexps more like Prolog NOT SUPPORTED | |
| (*COMMIT) NOT SUPPORTED | |
| (*F) NOT SUPPORTED | |
| (*FAIL) NOT SUPPORTED | |
| (*MARK) NOT SUPPORTED | |
| (*PRUNE) NOT SUPPORTED | |
| (*SKIP) NOT SUPPORTED | |
| (*THEN) NOT SUPPORTED | |
| (*ANY) set newline convention NOT SUPPORTED | |
| (*ANYCRLF) NOT SUPPORTED | |
| (*CR) NOT SUPPORTED | |
| (*CRLF) NOT SUPPORTED | |
| (*LF) NOT SUPPORTED | |
| (*BSR_ANYCRLF) set \R convention NOT SUPPORTED pcre | |
| (*BSR_UNICODE) NOT SUPPORTED pcre | |
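Constructs marked NOT SUPPORTED throughout this reference are syntax errors rather than silently ignored. The sketch below checks a handful of them at compile time, assuming Go's standard regexp package as an RE2-compatible parser (its error messages and its handling of newer additions may differ from the C++ library); rejected is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// rejected reports whether the parser refuses the pattern.
func rejected(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err != nil
}

func main() {
	patterns := []string{
		`(a)\1`,     // backreference
		`a(?=b)`,    // lookahead
		`a(?!b)`,    // negative lookahead
		`a*+`,       // possessive repetition
		`(?>a)`,     // possessive (atomic) group
		`(?#note)a`, // comment
		`(?:a)b`,    // supported control: non-capturing group
	}
	for _, p := range patterns {
		fmt.Printf("%-10s rejected=%v\n", p, rejected(p))
	}
}
```

Checking patterns at startup with a compile call like this is a simple way to catch syntax carried over from Perl or PCRE before it reaches production input.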